In this problem we will look into Newton's method. Newton's method is the dynamical system defined by the update process:
$$x_{n+1} = x_n - \left(\frac{dg}{dx}(x_n)\right)^{-1} g(x_n)$$For these problems, assume that $\frac{dg}{dx}$ is non-singular.
Show that if $x^\ast$ is a steady state of the equation, then $g(x^\ast) = 0$.
Because $x^*$ is assumed to be a steady state of the system, the update leaves it unchanged: $x_{n+1} = x_n = x^*$. Substituting into the update rule and subtracting $x^*$ from both sides, the correction term must be zero:
$$ \left( \frac{dg}{dx}(x^*) \right)^{-1} g(x^*) = 0 $$Because $\frac{dg}{dx}(x^*)$ is non-singular, we can left-multiply both sides by it, leaving $g(x^*) = 0$.
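As a concrete sanity check (a toy scalar example of my own choosing, not part of the problem statement), we can iterate Newton's method on $g(x) = x^2 - 2$ and confirm that the steady state it settles into is indeed a root:

```julia
# Toy example (assumption: scalar g) showing that Newton's steady state is a root.
g(x)  = x^2 - 2           # root at x* = √2
dg(x) = 2x

function newton(g, dg, x; iters = 20)
    for _ in 1:iters
        x -= g(x) / dg(x)   # Newton update step
    end
    return x
end

xstar = newton(g, dg, 1.0)
# xstar ≈ √2 and g(xstar) ≈ 0: the steady state satisfies g(x*) = 0
```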
Take a look at the Quasi-Newton approximation:
$$x_{n+1} = x_n - \left(\frac{dg}{dx}(x_0)\right)^{-1} g(x_n)$$for some fixed $x_0$. Derive the stability of the Quasi-Newton approximation in the form of a matrix whose eigenvalues need to be constrained. Use this to argue that if $x_0$ is sufficiently close to $x^\ast$ then the steady state is a stable (attracting) steady state.
We will assume that $\frac{dg}{dx}$ is continuous and that $x_n \in \mathbb{R}^k$. Because $x_0$ is fixed, we can precompute
$$ A = \frac{dg}{dx}(x_0) $$where $A$ is a constant $\mathbb{R}^{k \times k}$ matrix. Our update step looks like this:
$$ x_{n+1} = x_n - A^{-1}g(x_n) $$This is a discrete dynamical system $x_{n+1} = F(x_n)$ with $F(x) = x - A^{-1}g(x)$, similar to the multivariable case described in the lecture notes. Linearizing about the steady state $x^*$, the matrix whose eigenvalues need to be constrained is the Jacobian of the map:
$$ \frac{dF}{dx}(x^*) = I - A^{-1}\frac{dg}{dx}(x^*) $$The steady state is stable (attracting) if all eigenvalues of this matrix lie inside the unit circle, since then the linearized error $e_n = x_n - x^*$ satisfies $e_{n+1} \approx \frac{dF}{dx}(x^*)\, e_n \to 0$.
Now suppose $x_0$ is close to $x^*$. By continuity of $\frac{dg}{dx}$, we have $A = \frac{dg}{dx}(x_0) \approx \frac{dg}{dx}(x^*)$, so
$$ I - A^{-1}\frac{dg}{dx}(x^*) \approx I - I = 0 $$and the eigenvalues are all close to zero, well inside the unit circle. Therefore there exists a neighborhood of $x^*$ such that if $x_0$ lies in it, the steady state $x^*$ is stable.
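To see the neighborhood claim concretely, here is a toy scalar check (my own example, not from the notes): Quasi-Newton with the derivative frozen at $x_0$ still converges when $x_0$ is near the root, because $1 - A^{-1} g'(x^*)$ is then small in magnitude.

```julia
# Toy scalar example: g(x) = x^2 - 2 with the derivative frozen at x₀.
g(x)  = x^2 - 2
dg(x) = 2x

function quasi_newton(g, A, x; iters = 100)
    for _ in 1:iters
        x -= g(x) / A          # A = dg(x₀), fixed for all iterations
    end
    return x
end

x0 = 1.5                       # close to x* = √2 ≈ 1.4142
xstar = quasi_newton(g, dg(x0), x0)
# Linearized factor: 1 - dg(√2)/dg(1.5) = 1 - 2√2/3 ≈ 0.057, inside the unit
# circle, so the iteration converges (linearly rather than quadratically).
```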
Relaxed Quasi-Newton is the method:
$$x_{n+1} = x_n - \alpha \left(\frac{dg}{dx}(x_0)\right)^{-1} g(x_n)$$Argue that for sufficiently small $\alpha$ the Quasi-Newton iterations will be stable if the eigenvalues of $\left(\left(\frac{dg}{dx}(x_0)\right)^{-1} g(x_n)\right)^\prime$ are all positive for every $x$.
(Technically, these assumptions can be greatly relaxed, but weird cases arise. When $x \in \mathbb{C}$, this holds except on some set of Lebesgue measure zero. Feel free to explore this.)
As in the previous part, stability requires the eigenvalues of the linearized update map to lie inside the unit circle. With $A = \frac{dg}{dx}(x_0)$, the map is $F(x) = x - \alpha A^{-1}g(x)$, whose Jacobian is
$$ \frac{dF}{dx}(x) = I - \alpha \left( A^{-1} g(x) \right)' $$Note that $(A^{-1}g(x))'$ is the Jacobian of the preconditioned residual (not a Hessian). If its eigenvalues $\lambda_i$ are all positive for every $x$, then the eigenvalues of $\frac{dF}{dx}$ are $1 - \alpha\lambda_i$. Choosing $\alpha$ small enough that $\alpha\lambda_i < 2$ for all $i$ places every $1 - \alpha\lambda_i$ in $(-1, 1)$, inside the unit circle, so the iterations are stable near the steady state.
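A scalar illustration (a made-up toy with $A = 1$, so $(A^{-1}g(x))' = g'(x) = 50 > 0$): the unrelaxed step diverges because $|1 - 50| > 1$, while a small $\alpha$ brings $1 - \alpha \cdot 50$ inside the unit circle.

```julia
# Toy example: g(x) = 50x (root at 0), A = 1, so the relevant eigenvalue is 50 > 0.
function relaxed_qn(α; x = 1.0, iters = 20)
    for _ in 1:iters
        x -= α * 50x           # relaxed quasi-Newton step: x ← x - α A⁻¹ g(x)
    end
    return x
end

x_big   = relaxed_qn(1.0)      # factor 1 - 50 = -49: |x| blows up
x_small = relaxed_qn(0.01)     # factor 1 - 0.5 = 0.5: decays toward the root 0
```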
Fixed point iteration is the dynamical system
$$x_{n+1} = g(x_n)$$which, when it converges, converges to a fixed point satisfying $g(x) = x$.
To make $g(x) = 0$ the steady-state condition instead, we iterate on $h(x) = x - g(x)$:
$$ x_{n+1} = x_n - g(x_n) $$A steady state of this system satisfies $g(x^*) = 0$, and the update has the same form as the Quasi-Newton method from the previous parts.
Indeed, if we set
$$ \left( \frac{dg}{dx}(x_0) \right)^{-1} = I $$then the Quasi-Newton update reduces exactly to this fixed-point iteration. The trade-off is stability: the linearized map now has eigenvalues $1 - \lambda_i$, where the $\lambda_i$ are the eigenvalues of $\frac{dg}{dx}(x^*)$, so convergence depends directly on the Jacobian of $g$ rather than being preconditioned toward zero as in Quasi-Newton. In exchange, we never have to evaluate or invert the Jacobian, so each iteration is much cheaper: fixed-point iteration trades stability for speed.
It's also worth noting that Newton's method is itself a special case of fixed-point iteration: the Newton update step defines the map $g(x_n)$ of a fixed-point system. This means that the general properties of fixed-point systems apply to Newton's method as well.
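The stability difference shows up even in a scalar toy (my own example): for $g(x) = 3x$, fixed-point iteration $x_{n+1} = x_n - g(x_n)$ has factor $1 - 3 = -2$ and diverges, while Newton's preconditioning by $1/g'$ makes the factor $0$.

```julia
# Toy example: g(x) = 3x with root x* = 0.
g(x)  = 3x
dg(x) = 3.0

fp_step(x)     = x - g(x)           # fixed-point iteration: factor 1 - 3 = -2
newton_step(x) = x - g(x) / dg(x)   # Newton: factor 1 - 1 = 0

function run_iters(stepf, x; iters = 10)
    for _ in 1:iters
        x = stepf(x)
    end
    return x
end

x_fp = run_iters(fp_step, 0.1)      # grows as (-2)^n: diverges
x_nw = run_iters(newton_step, 0.1)  # exactly 0 after the first step
```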
In this problem we will practice writing fast and type-generic Julia code by producing an algorithm that will compute the quantile of any probability distribution.
Many problems can be interpreted as a rootfinding problem. For example, let's take a look at a problem in statistics. Let $X$ be a random variable with cumulative distribution function (CDF) $cdf(x)$. Recall that the CDF is a monotonically increasing function with range $[0,1]$ giving the total probability that $X < x$. The $y$th quantile of $X$ is the value $x$ at which $X$ has probability $y$ of being less than $x$. Interpret the problem of computing an arbitrary quantile $y$ as a rootfinding problem, and use Newton's method to write an algorithm for computing $x$.
(Hint: Recall that $cdf^{\prime}(x) = pdf(x)$, the probability density function.)
For a given CDF $cdf(x)$, the quantile $y$ is the root of $f(x) = cdf(x) - y$, which we can find using Newton's Method. The algorithm uses the usual Newton update step:
$$ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} $$where $f(x) = cdf(x) - y$ and $f'(x) = cdf'(x) = pdf(x)$, the probability density function.
"""my_quantile pseudocode
computes the quantile y of continuous distribution function d
@param y: percentile
@param d: Univariate Distribution
"""
function my_quantile(y,d):
x = 0
x_m1 = x
while(true):
f = cdf(d, x) - y
fprime = pdf(d, x)
x_m1 = x
x = x - f/fprime
if abs(x - x_m1) < 1e-6:
break
return x
Use the types from Distributions.jl to write a function
my_quantile(y,d) which uses multiple dispatch to compute the
$y$th quantile for any UnivariateDistribution d from Distributions.jl.
Test your function on Gamma(5, 1), Normal(0, 1), and Beta(2, 4) against
the Distributions.quantile function built into the library.
(Hint: Have a keyword argument for $x_0$, and let its default be the mean or median of the distribution.)
using Distributions, Test, BenchmarkTools
function my_quantile(
y::Float64,
d::Distributions.UnivariateDistribution,
x₀::Float64 = Distributions.median(d),
max_iter::Int = 1000,
abstol::Float64 = 1e-6)
x = 0.0
iter = 0
while iter < max_iter
# Update step
f = Distributions.cdf(d, x₀) - y
fprime = Distributions.pdf(d, x₀)
x = x₀ - (f/fprime)
iter += 1
# Convergence check
if abs(x - x₀) < abstol
break
end
x₀ = x
end
return x
end
my_quantile (generic function with 4 methods)
ps = 0.1:0.1:0.99
@testset "Gamma Distributions" begin
g = Distributions.Gamma(5, 1)
for p in ps
expected = Distributions.quantile(g, p)
actual = my_quantile(p, g)
@test abs(actual - expected) < 1e-6
end
end;
@testset "Normal Distributions" begin
g = Distributions.Normal(0, 1)
for p in ps
expected = Distributions.quantile(g, p)
actual = my_quantile(p, g)
@test abs(actual - expected) < 1e-6
end
end;
@testset "Beta Distributions" begin
g = Distributions.Beta(2, 4)
for p in ps
expected = Distributions.quantile(g, p)
actual = my_quantile(p, g)
@test abs(actual - expected) < 1e-6
end
end;
Test Summary:        | Pass  Total
Gamma Distributions  |    9      9
Test Summary:        | Pass  Total
Normal Distributions |    9      9
Test Summary:        | Pass  Total
Beta Distributions   |    9      9
g = Normal(0, 1)
p = 0.01
@btime my_quantile(p, g);
@btime Distributions.quantile(g, p);
564.607 ns (1 allocation: 16 bytes)
57.244 ns (1 allocation: 16 bytes)
In this problem we will write code for efficient generation of the bifurcation diagram of the logistic equation.
The logistic equation is the dynamical system given by the update relation:
$$x_{n+1} = rx_n (1-x_n)$$where $r$ is some parameter. Write a function which iterates the equation from
$x_0 = 0.25$ enough times to be sufficiently close to its long-term behavior
(400 iterations) and samples 150 points from the steady state attractor
(i.e. output iterations 401:550) as a function of $r$, and mutates some vector
as a solution, i.e. calc_attractor!(out,f,p,num_attract=150;warmup=400).
Test your function with $r = 2.9$. Double check that your function computes the correct result by calculating the analytical steady state value.
For our purposes, input parameter $f$ will be the nonlinearity $f(x) = x(1-x)$, so the update is $x_{n+1} = p\,f(x_n)$. $p$ represents the parameters; in the univariate case, this is $p = r$.
function calc_attractor!(out,
f,
p,
x₀ = 0.25,
num_attract::Int = 150,
warmup::Int = 400)
x = x₀
# Warmup round; get to long-term behavior
for i = 1:warmup
x = p * f(x)
end
# Sampling round
for i = 1:num_attract
x = p * f(x)
out[i] = x
end
end;
There is no general closed-form solution for the trajectory of the logistic difference equation, but its steady states have a closed form: they satisfy $x = rx(1-x)$, which has the nonzero solution
$$ x^* = 1 - \frac{1}{r} $$For $r = 2.9$ this gives $x^* = 1 - 1/2.9 \approx 0.655172$. We can check whether the function works:
r = 2.9
f(x) = x*(1.0 - x)
res = Vector{Float64}(undef, 150)
calc_attractor!(res, f, r)
@show res[end];
res[end] = 0.6551724137931038
Our result matches the analytical steady state $x^* = 1 - 1/2.9 \approx 0.655172$.
The bifurcation plot shows how a steady state changes as a parameter changes.
Compute the long-term result of the logistic equation at the values of
r = 2.9:0.001:4, and plot the steady state values for each $r$ as an
r x steady_attractor scatter plot. You should get a very bizarrely awesome
picture, the bifurcation graph of the logistic equation.
(Hint: Generate a single matrix for the attractor values, and use calc_attractor!
on views of columns for calculating the output, or inline the calc_attractor!
computation directly onto the matrix, or even give calc_attractor! an input
for what column to modify.)
using Plots
ps = 2.9:0.01:4
num_attract = 150 # hard-coded to match calc_attractor! default
res = Matrix{Float64}(undef, num_attract, length(ps))
function bifur_serial(f, ps)
for i = 1:length(ps)
calc_attractor!(view(res,:,i), f, ps[i])
end
end
@btime bifur_serial(f, ps)
265.900 μs (555 allocations: 15.61 KiB)
# Build bifurcation diagram
n = num_attract
L = length(ps)
x = Vector{Float64}(undef, n*L)
y = copy(x)
for j in 1:L
x[(1 + (j-1)*n):j*n] .= ps[j]
y[(1 + (j-1)*n):j*n] .= res[:,j]
end
scatter(x, y, legend=false, markersize=1)
title!("Bifurcation Diagram")
xlabel!("r Parameter Value")
ylabel!("x")
Multithread your bifurcation graph generator by performing different steady state calculations on different threads. Does your timing improve? Why? Be careful and check to make sure you have more than 1 thread!
Threads.nthreads()
4
res = Matrix{Float64}(undef, num_attract, length(ps))
function bifur_multithread(f, ps)
Threads.@threads for i = 1:length(ps)
calc_attractor!(view(res,:,i), f, ps[i])
end
end
@btime bifur_multithread(f, ps)
78.800 μs (577 allocations: 18.70 KiB)
# Build bifurcation diagram
n = num_attract
L = length(ps)
x = Vector{Float64}(undef, n*L)
y = copy(x)
for j in 1:L
x[(1 + (j-1)*n):j*n] .= ps[j]
y[(1 + (j-1)*n):j*n] .= res[:,j]
end
scatter(x, y, legend=false, markersize=1)
title!("Bifurcation Diagram")
xlabel!("r Parameter Value")
ylabel!("x")
The timing does improve, because the parameter sweep is split across multiple threads (4 on my machine). The observed speedup is roughly 3.4x (266 μs → 79 μs), close to the theoretical limit of 4x given 4 threads.
Multiprocess your bifurcation graph generator first by using pmap, and then
by using @distributed. Does your timing improve? Why? Be careful to add
processes before doing the distributed call.
(Note: You may need to change your implementation around to be allocating differently in order for it to be compatible with multiprocessing!)
using Distributed
addprocs(4)  # add workers before @everywhere so the definitions reach them
@everywhere using SharedArrays
@everywhere function calc_attractor!(out,
f,
p,
x₀ = 0.25,
num_attract::Int = 150,
warmup::Int = 400)
x = x₀
# Warmup round; get to long-term behavior
for i = 1:warmup
x = p * f(x)
end
# Sampling round
for i = 1:num_attract
x = p .* f(x)
out[i] = x
end
end;
@everywhere f(x) = x*(1.0 - x)
@everywhere ps = 2.9:0.01:4
@everywhere num_attract = 150
res = SharedMatrix{Float64}(num_attract, length(ps))
@btime pmap(i -> calc_attractor!(view(res,:,i), f, ps[i]), 1:length(ps));
43.230 ms (9982 allocations: 369.70 KiB)
# Build bifurcation diagram
n = num_attract
L = length(ps)
x = Vector{Float64}(undef, n*L)
y = copy(x)
for j in 1:L
x[(1 + (j-1)*n):j*n] .= ps[j]
y[(1 + (j-1)*n):j*n] .= res[:,j]
end
scatter(x, y, legend=false, markersize=1)
title!("Bifurcation Diagram")
xlabel!("r Parameter Value")
ylabel!("x")
res = SharedMatrix{Float64}(num_attract, length(ps))
function bifur_distributed(f, ps)
    # @sync waits for all workers; results land in the SharedMatrix res
    @sync @distributed for i in 1:length(ps)
        calc_attractor!(view(res, :, i), f, ps[i])
    end
    return res
end
@btime res = bifur_distributed(f, ps);
2.545 ms (540 allocations: 25.97 KiB)
# Build bifurcation diagram
n = num_attract
L = length(ps)
x = Vector{Float64}(undef, n*L)
y = copy(x)
for j in 1:L
x[(1 + (j-1)*n):j*n] .= ps[j]
y[(1 + (j-1)*n):j*n] .= res[:,j]
end
scatter(x, y, legend=false, markersize=1)
title!("Bifurcation Diagram")
xlabel!("r Parameter Value")
ylabel!("x")
Timing does not improve relative to the multithreaded case. @distributed (≈2.5 ms) is slower even than the serial loop (≈266 μs): although its static scheduler keeps per-task overhead low, the work must still be shipped to worker processes. pmap (≈43 ms) is far worse, since its dynamic scheduler pays a round of interprocess communication for every tiny task. Neither beats the multithreaded version because the overhead of multiprocessing is unjustified for a problem of this type and size.
Which method is the fastest? Why?
Multithreading is the fastest. The per-parameter computations are small, independent, and write to disjoint columns, so they split across threads in shared memory in a threadsafe way, while the problem's size isn't large enough to amortize the communication overhead of multiprocessing.